Abstract: Hoeffding has shown that tail bounds on the distribution for sampling from a finite population with replacement also apply to the corresponding cases of sampling without replacement. (A special case of this result is that binomial tail bounds apply to the corresponding hypergeometric tails.) We give a new proof of Hoeffding's result by constructing a martingale coupling between the sampling distributions. This construction is given by an explicit combinatorial procedure involving balls and urns. We then apply this construction to create martingale couplings between other pairs of sampling distributions, both without replacement and with "surreplacement" (that is, sampling in which not only is the sampled individual replaced, but some number of "copies" of that individual are added to the population).



1. Introduction 

In 1963, Hoeffding [H, Section 6, Theorem 4] proved the following theorem.

Theorem 1.1: (W. Hoeffding) Let the population $C$ consist of $N$ values $c_1, c_2, \ldots, c_N$. Let $X_1, X_2, \ldots, X_n$ denote a random sample without replacement from $C$ and let $Y_1, Y_2, \ldots, Y_n$ denote a random sample with replacement from $C$. Let $S_n = X_1 + X_2 + \cdots + X_n$ and $T_n = Y_1 + Y_2 + \cdots + Y_n$. Then if the function $f : \mathbb{R} \to \mathbb{R}$ is convex,

$$\mathrm{Ex}[f(S_n)] \le \mathrm{Ex}[f(T_n)].$$
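As a quick numerical illustration (not part of the original argument), the sketch below enumerates every ordered sample drawn from a small, hypothetical population and compares $\mathrm{Ex}[f(S_n)]$ with $\mathrm{Ex}[f(T_n)]$ exactly. The helper name `hoeffding_expectations` and the population values are assumptions chosen for illustration; integer-valued convex test functions keep the arithmetic exact with `Fraction`.

```python
from fractions import Fraction
from itertools import permutations, product


def hoeffding_expectations(population, n, f):
    """Exact Ex[f(S_n)] (without replacement) and Ex[f(T_n)] (with replacement).

    Every ordered sample is equally likely, so averaging over all of them gives
    the exact expectations. `f` should return integers so Fraction stays exact.
    """
    idx = range(len(population))
    # Ordered samples without replacement: all n-permutations of the N indices.
    sums_without = [sum(population[i] for i in s) for s in permutations(idx, n)]
    # Ordered samples with replacement: all N**n index tuples.
    sums_with = [sum(population[i] for i in s) for s in product(idx, repeat=n)]
    e_s = Fraction(sum(f(v) for v in sums_without), len(sums_without))
    e_t = Fraction(sum(f(v) for v in sums_with), len(sums_with))
    return e_s, e_t


population = [0, 1, 1, 3, 7]  # hypothetical values c_1, ..., c_N
convex_tests = [lambda v: v * v, lambda v: abs(v - 6), lambda v: max(v - 8, 0)]
for n in range(1, len(population) + 1):
    for f in convex_tests:
        e_s, e_t = hoeffding_expectations(population, n, f)
        assert e_s <= e_t  # the inequality of Theorem 1.1
print(hoeffding_expectations(population, 3, lambda v: v * v))
```

For $n = 1$ the two sampling schemes coincide, so the two expectations are equal; the gap opens as $n$ grows.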

Our first goal in this paper is to give a new proof of Theorem 1.1. Our proof is based on a stochastic order relation. The most familiar stochastic order relation is that of stochastic domination, which we shall denote $\le_i$. Stochastic domination can be defined in several equivalent ways. Let $S$ and $T$ be real-valued random variables with finite expectations. Then $S \le_i T$ if $S$ and $T$ satisfy either of the following two equivalent conditions.

(I-1) There exists an increasing coupling between $S$ and $T$ (that is, there is a random variable $(\hat S, \hat T)$ such that $\hat S$ has the same distribution as $S$, $\hat T$ has the same distribution as $T$, and $\hat S \le \hat T$ with probability one).

(I-2) For any function $f : \mathbb{R} \to \mathbb{R}$, if $f$ is increasing (that is, if $x \le y$ implies $f(x) \le f(y)$), then $\mathrm{Ex}[f(S)] \le \mathrm{Ex}[f(T)]$.

(See for example Müller and Stoyan [M, Chapter 1] or Shaked and Shanthikumar [S2, Chapter 1].)

The stochastic order relation that is of importance in our proof is that of convex domination, which we shall denote $\le_c$. Convex domination can also be defined in several equivalent ways. Specifically, $S \le_c T$ if $S$ and $T$ satisfy either of the following two equivalent conditions.

(C-1) There exists a martingale coupling between $S$ and $T$ (that is, there is a random variable $(\hat S, \hat T)$ such that $\hat S$ has the same distribution as $S$, $\hat T$ has the same distribution as $T$, and $(\hat S, \hat T)$ is a martingale; that is, $\mathrm{Ex}[\hat T \mid \hat S] = \hat S$).

(C-2) For any function $f : \mathbb{R} \to \mathbb{R}$, if $f$ is convex, then $\mathrm{Ex}[f(S)] \le \mathrm{Ex}[f(T)]$.

(See for example Müller and Stoyan [M, Chapter 1] or Shaked and Shanthikumar [S2, Chapter 3].)

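For our proof, we shall only need the implication (C-1)⇒(C-2), which is easily proved as follows. If $R$ is a random variable, we shall write $F_R(r) = \Pr[R \le r]$ for the distribution function of $R$, so that $\mathrm{Ex}[R] = \int r\, dF_R(r)$. We use the tower formula $\mathrm{Ex}[R] = \mathrm{Ex}[\mathrm{Ex}[R \mid S]]$ for conditional expectations, Jensen's inequality $f(\mathrm{Ex}[R]) \le \mathrm{Ex}[f(R)]$ for convex $f$ (applied to the conditional distribution of $\hat T$ given $\hat S = s$), and the fact that $(\hat S, \hat T)$ is a martingale:

$$
\begin{aligned}
\mathrm{Ex}[f(T)] &= \mathrm{Ex}[f(\hat T)] \\
&= \mathrm{Ex}\bigl[\mathrm{Ex}[f(\hat T) \mid \hat S]\bigr] \\
&\ge \int f\bigl(\mathrm{Ex}[\hat T \mid \hat S = s]\bigr)\, dF_{\hat S}(s) \\
&= \int f(s)\, dF_{\hat S}(s) \\
&= \mathrm{Ex}[f(\hat S)] \\
&= \mathrm{Ex}[f(S)].
\end{aligned}
$$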
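The implication can be checked on any small discrete example. The sketch below, an illustrative assumption rather than anything in the paper, represents a joint distribution of $(\hat S, \hat T)$ as a dictionary of exact probabilities, tests the martingale condition $\mathrm{Ex}[\hat T \mid \hat S = s] = s$ directly, and then compares $\mathrm{Ex}[f(\hat S)]$ with $\mathrm{Ex}[f(\hat T)]$ for a few convex $f$. The names `is_martingale` and `expect` are hypothetical helpers.

```python
from fractions import Fraction


def is_martingale(joint):
    """Return True if Ex[T | S = s] == s for the finite joint pmf {(s, t): prob}."""
    for s in {s for s, _ in joint}:
        mass = sum(p for (s2, _), p in joint.items() if s2 == s)
        mean_t = sum(p * t for (s2, t), p in joint.items() if s2 == s) / mass
        if mean_t != s:
            return False
    return True


def expect(joint, f, coord):
    """Ex[f(S)] when coord == 0, Ex[f(T)] when coord == 1."""
    return sum(p * f(pair[coord]) for pair, p in joint.items())


# Hypothetical coupling: S is 1 or 3; given S = s, T is s - 1 or s + 1 with equal odds.
quarter = Fraction(1, 4)
joint = {(1, 0): quarter, (1, 2): quarter, (3, 2): quarter, (3, 4): quarter}
assert is_martingale(joint)
for f in (lambda x: x * x, lambda x: abs(x - 2), lambda x: 2 ** x):
    assert expect(joint, f, 1) >= expect(joint, f, 0)  # (C-1) implies (C-2)
```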

The implication (C-1)⇒(C-2) shows that Theorem 1.1 is a consequence of the following proposition, which will be proved in Section 2.

Proposition 1.1: Let the population $C$ consist of $N$ values $c_1, c_2, \ldots, c_N$. Let $X_1, X_2, \ldots, X_n$ denote a random sample without replacement from $C$ and let $Y_1, Y_2, \ldots, Y_n$ denote a random sample with replacement from $C$. Let $S_n = X_1 + X_2 + \cdots + X_n$ and $T_n = Y_1 + Y_2 + \cdots + Y_n$. Then there is a martingale coupling between $S_n$ and $T_n$.

Hoeffding used Theorem 1.1 to transfer bounds he had obtained for the tails of the distributions of sums of the independent random variables $Y_i$ to the corresponding tails for the dependent random variables $X_i$. This transfer is possible because tail bounds typically employ a convex function, such as a quadratic or exponential, to weight large deviations from the mean more heavily than small ones.

We shall illustrate this transfer of bounds by showing how a bound on the tail of a binomially distributed random variable transfers to that of a hypergeometrically distributed random variable. In this case, we take $N = a + b$, $c_1 = c_2 = \cdots = c_a = 1$ and $c_{a+1} = c_{a+2} = \cdots = c_{a+b} = 0$ (modeling an urn containing $a$ red balls and $b$ blue balls). Then $S_n$ is hypergeometrically distributed (the number of red balls drawn in $n$ draws without replacement), while $T_n$ is binomially distributed (the number of red balls drawn in $n$ draws with replacement, or the number of successes in $n$ independent trials, each of which succeeds with probability $p = a/(a + b)$).

For the bound on the tail of the distribution of $T_n$ we shall use the well-known method due to Chernoff [C1]. If $R$ is a random variable, we shall denote by $M_R(u) = \mathrm{Ex}[e^{uR}] = \int e^{ur}\, dF_R(r)$ the moment generating function of $R$. Chernoff's bound on the upper tail of $R$ is, for every $u \ge 0$,

$$
\begin{aligned}
\Pr[R \ge w] &= \int_{r \ge w} dF_R(r) \\
&\le \int e^{u(r - w)}\, dF_R(r) \\
&= e^{-uw} M_R(u).
\end{aligned}
$$



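Let $T_n$ be binomially distributed, as the number of red balls drawn in $n$ draws with replacement from an urn containing $a$ red balls and $b$ blue balls, or the number of successes in $n$ independent trials, each of which succeeds with probability $p = a/(a + b)$. Since $M_{T_n}(u) = (pe^u + 1 - p)^n$, Chernoff's bound yields $\Pr[T_n \ge (p + q)n] \le e^{-u(p + q)n}(pe^u + 1 - p)^n$, and minimizing this bound over $u \ge 0$ (the minimum is attained at $e^u = (p + q)(1 - p)/(p(1 - p - q))$) yields, for $0 < q < 1 - p$,

$$\Pr[T_n \ge (p + q)n] \le \left[\left(\frac{p}{p + q}\right)^{p + q} \left(\frac{1 - p}{1 - p - q}\right)^{1 - p - q}\right]^n. \qquad (1.1)$$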
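The optimization leading to (1.1) is easy to check numerically. The sketch below, which assumes $0 < q < 1 - p$ and uses only Python's standard `math` module, evaluates the closed-form bound, confirms that it matches the unoptimized bound at the stated minimizer $u^*$ and is no larger than it elsewhere on a grid, and compares it with the exact binomial tail. The function names and the parameters $n = 40$, $p = q = 1/4$ are illustrative assumptions.

```python
import math


def binomial_upper_tail(n, p, k):
    """Exact Pr[T_n >= k] for T_n ~ Binomial(n, p)."""
    return sum(math.comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


def chernoff_at(n, p, q, u):
    """Unoptimized Chernoff bound e^{-u(p+q)n} (p e^u + 1 - p)^n, for u >= 0."""
    return math.exp(-u * (p + q) * n) * (p * math.exp(u) + 1 - p) ** n


def bound_1_1(n, p, q):
    """Closed-form right-hand side of (1.1); assumes 0 < q < 1 - p."""
    r = p + q
    return ((p / r) ** r * ((1 - p) / (1 - r)) ** (1 - r)) ** n


def optimal_u(p, q):
    """Minimizer u* with e^{u*} = (p + q)(1 - p) / (p (1 - p - q))."""
    return math.log((p + q) * (1 - p) / (p * (1 - p - q)))


n, p, q = 40, 0.25, 0.25  # threshold (p + q) n = 20
u_star = optimal_u(p, q)
assert math.isclose(chernoff_at(n, p, q, u_star), bound_1_1(n, p, q))
assert all(chernoff_at(n, p, q, u / 10) >= bound_1_1(n, p, q) * (1 - 1e-12)
           for u in range(0, 40))
assert binomial_upper_tail(n, p, 20) <= bound_1_1(n, p, q)
```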




We shall transfer the bound (1.1) to the corresponding tail of the corresponding hypergeometric distribution. Let $S_n$ be hypergeometrically distributed, as the number of red balls drawn in $n$ draws without replacement from an urn containing $a$ red balls and $b$ blue balls. The Chernoff bound on the upper tail of $S_n$ is hard to evaluate exactly (because $M_{S_n}(u)$ is a hypergeometric function, from which the distribution gets its name). But Theorem 1.1, with the convex function $f(v) = e^{uv}$, tells us that

$$M_{S_n}(u) = \mathrm{Ex}[e^{uS_n}] \le \mathrm{Ex}[e^{uT_n}] = M_{T_n}(u).$$
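The inequality $M_{S_n}(u) \le M_{T_n}(u)$ can be spot-checked for a small urn by computing both moment generating functions directly, the hypergeometric one from its probability mass function. The sketch below is illustrative: the urn sizes are arbitrary assumptions, and since $f(v) = e^{uv}$ is convex for every real $u$, negative values of $u$ are checked as well.

```python
import math


def hypergeom_mgf(a, b, n, u):
    """M_{S_n}(u): S_n counts red balls in n draws without replacement (a red, b blue)."""
    total = math.comb(a + b, n)
    # math.comb returns 0 when the lower index exceeds the upper one.
    return sum(math.comb(a, j) * math.comb(b, n - j) * math.exp(u * j)
               for j in range(n + 1)) / total


def binomial_mgf(a, b, n, u):
    """M_{T_n}(u) = (p e^u + 1 - p)^n with p = a / (a + b)."""
    p = a / (a + b)
    return (p * math.exp(u) + 1 - p) ** n


a, b, n = 6, 9, 5  # hypothetical urn: 6 red, 9 blue, 5 draws
for step in range(-20, 21):
    u = step / 10
    # Small relative slack absorbs floating-point rounding near u = 0.
    assert hypergeom_mgf(a, b, n, u) <= binomial_mgf(a, b, n, u) * (1 + 1e-12)
```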

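Thus the Chernoff bound for $T_n$ applies to $S_n$ as well, yielding

$$\Pr[S_n \ge (p + q)n] \le \left[\left(\frac{p}{p + q}\right)^{p + q} \left(\frac{1 - p}{1 - p - q}\right)^{1 - p - q}\right]^n. \qquad (1.2)$$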
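To see the transferred bound at work, the sketch below computes the exact hypergeometric tail, the exact binomial tail, and the right-hand side of (1.2) for one hypothetical urn ($a = 20$, $b = 60$, $n = 40$, so $p = 1/4$, and $q = 1/4$). It is a numerical illustration under assumed parameters, not a substitute for the proof.

```python
import math


def hypergeom_upper_tail(a, b, n, k):
    """Exact Pr[S_n >= k] for red balls in n draws without replacement."""
    total = math.comb(a + b, n)
    return sum(math.comb(a, j) * math.comb(b, n - j) for j in range(k, n + 1)) / total


def binomial_upper_tail(n, p, k):
    """Exact Pr[T_n >= k] for T_n ~ Binomial(n, p)."""
    return sum(math.comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


def bound_1_2(n, p, q):
    """Right-hand side of (1.2), identical to (1.1); assumes 0 < q < 1 - p."""
    r = p + q
    return ((p / r) ** r * ((1 - p) / (1 - r)) ** (1 - r)) ** n


a, b, n = 20, 60, 40         # hypothetical urn with p = 1/4
p, q = a / (a + b), 0.25     # threshold (p + q) n = 20
hyper = hypergeom_upper_tail(a, b, n, 20)
binom = binomial_upper_tail(n, p, 20)
print(f"hypergeometric {hyper:.3e}  binomial {binom:.3e}  bound {bound_1_2(n, p, q):.3e}")
```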

(Chvátal [C2] has given a proof of the bound (1.2) by direct manipulation of sums of binomial coefficients.)

In Section 2, we shall give our construction of the martingale coupling for the proof of Proposition 1.1. In Section 3, we shall apply our method to construct martingale couplings between other pairs of distributions arising from various instances of sampling from finite populations, without replacement, with replacement, and with "surreplacement" (that is, with the sampled value being replaced, together with one or more additional copies of that value). The results of this paper first appeared in the first author's bachelor's thesis [L].

2. Proof of Proposition 1.1 

We begin with two urns. The first urn, $\mathcal{X}$, contains $N$ balls, $x_1, \ldots, x_N$. Each of these balls is labeled with its number; that is, ball $x_i$ is labeled $i$. Balls will be drawn from urn $\mathcal{X}$ without replacement. The second urn, $\mathcal{Y}$, contains $N$ balls, $y_1, \ldots, y_N$. Each of these balls is initially unlabeled but will eventually be assigned a label. Balls will be drawn from urn $\mathcal{Y}$ with replacement.

We now perform an infinite sequence of steps as follows. In the course of these steps we shall define a bijective map $\xi : \{1, \ldots, N\} \to \{1, \ldots, N\}$ and a surjective map $\eta : \{1, 2, \ldots\} \to \{1, \ldots, N\}$. At each step, we draw a ball from urn $\mathcal{Y}$. If the ball drawn is still unlabeled, we draw a ball from urn $\mathcal{X}$, assign the label of the ball drawn from urn $\mathcal{X}$ to the ball drawn from urn $\mathcal{Y}$, and then replace the ball drawn from urn $\mathcal{Y}$ in urn $\mathcal{Y}$. If the ball drawn from urn $\mathcal{Y}$ has already been assigned a label, we simply replace it in urn $\mathcal{Y}$. Since, with probability one, every ball in urn $\mathcal{Y}$ will eventually be drawn, every ball in urn $\mathcal{Y}$ will eventually be assigned a label.

We define $\xi(i)$ to be the label of the $i$-th ball drawn from urn $\mathcal{X}$. Since every ball from $\mathcal{X}$ is eventually drawn, and balls are drawn from $\mathcal{X}$ without replacement, $\xi$ is a permutation of $\{1, \ldots, N\}$. We define $\eta(i)$ to be the label assigned to the ball drawn from urn $\mathcal{Y}$ at the $i$-th step (either during the $i$-th step or at some previous step). Since each of the labels $1, \ldots, N$ is eventually assigned to one of the balls in urn $\mathcal{Y}$, $\eta$ maps $\{1, 2, \ldots\}$ onto $\{1, \ldots, N\}$.

The process just described creates a coupling between $\xi$, which is uniformly distributed over all permutations of $\{1, \ldots, N\}$, and $\eta$, which is a sequence $\eta(1), \eta(2), \ldots$ of independent random variables, each uniformly distributed over $\{1, \ldots, N\}$.
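The two-urn procedure can be simulated directly by truncating the infinite sequence of steps at a finite horizon. The sketch below is a hypothetical Python rendering with 0-based labels (label $j$ in the code corresponds to label $j + 1$ in the text); drawing from urn $\mathcal{X}$ without replacement is modeled by popping from a shuffled list, and the function name `urn_coupling` is an assumption.

```python
import random


def urn_coupling(N, steps, rng=random):
    """Simulate the two-urn procedure of Section 2 for `steps` steps.

    Returns (xi, eta) with 0-based labels: xi is the full order in which labels
    are drawn from urn X, and eta[i] is the label of the ball drawn from urn Y
    at step i + 1. Only the first `steps` terms of eta are generated.
    """
    urn_x = list(range(N))
    rng.shuffle(urn_x)             # popping from the end gives a uniform order
    xi = []
    y_label = [None] * N           # labels of balls y_1, ..., y_N; None = unlabeled
    eta = []
    for _ in range(steps):
        ball = rng.randrange(N)    # draw from urn Y with replacement
        if y_label[ball] is None:  # unlabeled: draw from urn X and copy its label
            y_label[ball] = urn_x.pop()
            xi.append(y_label[ball])
        eta.append(y_label[ball])
    xi.extend(reversed(urn_x))     # later draws from urn X continue the same order
    return xi, eta


rng = random.Random(1)
xi, eta = urn_coupling(N=6, steps=12, rng=rng)
print(xi, eta)
```

A useful invariant to inspect is that, for every $n \le N$, the labels among the first $n$ terms of `eta` all occur among the first $n$ terms of `xi`; this is the fact used in the martingale argument.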

Let $c_1, \ldots, c_N$ be real numbers. We shall define the random variables $X_1, \ldots, X_N$ by $X_i = c_{\xi(i)}$ for $1 \le i \le N$, and the random variables $Y_1, Y_2, \ldots$ by $Y_i = c_{\eta(i)}$ for $i \ge 1$. This definition creates a coupling between the sequence $X_1, \ldots, X_N$, which is distributed as a random sample without replacement from the population $c_1, \ldots, c_N$, and the sequence $Y_1, Y_2, \ldots$, which is distributed as a sequence of independent random samples with replacement from the same population.

Let $n$ be an integer in the range $1 \le n \le N$. We define $S_n = X_1 + \cdots + X_n$ and $T_n = Y_1 + \cdots + Y_n$. This definition creates a coupling between $S_n$, which is distributed as the sum of a random sample of size $n$ without replacement from the population $c_1, \ldots, c_N$, and $T_n$, which is distributed as the sum of a random sample of size $n$ with replacement from the same population.

We shall now show that $(S_n, T_n)$ is a martingale; that is, that

$$\mathrm{Ex}[T_n \mid S_n] = S_n. \qquad (2.1)$$

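If $S_n = s$, then $c_{\xi(1)} + \cdots + c_{\xi(n)} = s$, and $(\xi(1), \ldots, \xi(n))$ is equally likely to be any of the sequences of distinct indices satisfying this constraint. Since any permutation of such a sequence is again such a sequence, the conditional expectations $\mathrm{Ex}[c_{\xi(i)} \mid S_n = s]$ for $1 \le i \le n$ are all equal; as they sum to $s$, we have

$$\mathrm{Ex}[c_{\xi(i)} \mid S_n = s] = s/n \qquad (2.2)$$

for $1 \le i \le n$. Now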

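$$
\begin{aligned}
\mathrm{Ex}[T_n \mid S_n = s] &= \mathrm{Ex}[Y_1 \mid S_n = s] + \cdots + \mathrm{Ex}[Y_n \mid S_n = s] \\
&= \mathrm{Ex}[c_{\eta(1)} \mid S_n = s] + \cdots + \mathrm{Ex}[c_{\eta(n)} \mid S_n = s].
\end{aligned}
\qquad (2.3)
$$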

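Each $\eta(i)$ with $1 \le i \le n$ is equal to one of $\xi(1), \ldots, \xi(n)$: specifically, $\eta(i) = \xi(J_i)$, where $J_i \le i$ is the number of balls drawn from urn $\mathcal{X}$ up to and including the step at which the ball drawn at step $i$ received its label. The index $J_i$ is determined by the draws from urn $\mathcal{Y}$ alone and is therefore independent of $\xi$, so by (2.2) each of the $n$ terms in (2.3) is equal to $s/n$, and thus $\mathrm{Ex}[T_n \mid S_n = s] = s$. This completes the proof of (2.1), and shows that the coupling $(S_n, T_n)$ is a martingale.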
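For a small population, the martingale property (2.1) can be verified exactly by enumerating the coupling: the order of draws from urn $\mathcal{X}$ is a uniform permutation, independent of the first $n$ draws from urn $\mathcal{Y}$, which are uniform over $\{1, \ldots, N\}^n$, so all combinations are equally likely. The sketch below (hypothetical helper `conditional_means`, 0-based labels, integer values so that `Fraction` stays exact) does this by brute force.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import permutations, product


def conditional_means(c, n):
    """Exact Ex[T_n | S_n = s] for the Section 2 coupling, by full enumeration."""
    N = len(c)
    total = defaultdict(Fraction)  # s -> sum of T_n over outcomes with S_n = s
    count = defaultdict(int)       # s -> number of outcomes with S_n = s
    for xi in permutations(range(N)):              # order of draws from urn X
        s = sum(c[xi[j]] for j in range(n))
        for draws in product(range(N), repeat=n):  # first n draws from urn Y
            labels, next_x, t = {}, 0, 0
            for ball in draws:
                if ball not in labels:             # unlabeled: take next X label
                    labels[ball] = xi[next_x]
                    next_x += 1
                t += c[labels[ball]]
            total[s] += t
            count[s] += 1
    return {s: total[s] / count[s] for s in total}


c = [0, 1, 1, 3]  # hypothetical population values
for n in range(1, len(c) + 1):
    assert all(mean == s for s, mean in conditional_means(c, n).items())
```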

3. Other Martingale Couplings 

In this section we shall construct martingale couplings for other pairs of probability distributions. (For these pairs, neither distribution has a simple moment generating function, so they do not facilitate the transfer of tail bounds in the same way as Proposition 1.1.) The first of these pairs compares samples without replacement from two populations, one of which is a "$k$-fold multiplication" of the other (that is, contains $k$ "copies" of each individual from the other population).

Proposition 3.1: Let the population $C$ consist of $N$ values $c_1, c_2, \ldots, c_N$. Let the population $D = kC$ consist of $kN$ values $d_{1,1} = \cdots = d_{1,k} = c_1, \ldots, d_{N,1} = \cdots = d_{N,k} = c_N$. Let $X_1, X_2, \ldots, X_n$ denote a random sample without replacement from $C$ and let $Y_1, Y_2, \ldots, Y_n$ denote a random sample without replacement from $D$. Let $S_n = X_1 + X_2 + \cdots + X_n$ and $T_n = Y_1 + Y_2 + \cdots + Y_n$. Then there is a martingale coupling between $S_n$ and $T_n$.

Proof: We begin with two urns. The first urn, $\mathcal{X}$, contains $N$ balls, $x_1, \ldots, x_N$. Each of these balls is labeled with its number; that is, ball $x_i$ is labeled $i$. Balls will be drawn from urn $\mathcal{X}$ without replacement. The second urn, $\mathcal{Y}$, contains $kN$ balls, $y_{1,1}, \ldots, y_{1,k}, \ldots, y_{N,1}, \ldots, y_{N,k}$. Each of these balls is initially unlabeled but will eventually be assigned a label. Balls will be drawn from urn $\mathcal{Y}$ without replacement. For $1 \le m \le N$, the balls $y_{m,1}, \ldots, y_{m,k}$ will be said to comprise the $m$-th cohort.

We now perform a sequence of $kN$ steps as follows. In the course of these steps we shall define a bijective map $\xi : \{1, \ldots, N\} \to \{1, \ldots, N\}$ and a surjective map $\eta : \{1, \ldots, kN\} \to \{1, \ldots, N\}$. At each step, we draw a ball from urn $\mathcal{Y}$. If the ball drawn is still unlabeled, we draw a ball from urn $\mathcal{X}$ and assign the label of the ball drawn from urn $\mathcal{X}$ to the ball drawn from urn $\mathcal{Y}$ and to the $k - 1$ other balls in its cohort. The ball drawn from urn $\mathcal{Y}$ is not replaced, and the other balls in its cohort remain in the urn. If the ball drawn from urn $\mathcal{Y}$ has already been assigned a label, we proceed to the next step.

We define $\xi(i)$ to be the label of the $i$-th ball drawn from urn $\mathcal{X}$. Since every ball from $\mathcal{X}$ is eventually drawn, and balls are drawn from $\mathcal{X}$ without replacement, $\xi$ is a permutation of $\{1, \ldots, N\}$. We define $\eta(i)$ to be the label assigned to the ball drawn from urn $\mathcal{Y}$ at the $i$-th step (either during the $i$-th step or at some previous step). Since each of the labels $1, \ldots, N$ is eventually assigned to one of the balls in urn $\mathcal{Y}$, $\eta$ maps $\{1, \ldots, kN\}$ onto $\{1, \ldots, N\}$.

The process just described creates a coupling between $\xi$, which is uniformly distributed over all permutations of $\{1, \ldots, N\}$, and $\eta$, which is uniformly distributed over the maps $\eta : \{1, \ldots, kN\} \to \{1, \ldots, N\}$ such that, for every $1 \le j \le N$, $\eta(h) = j$ for exactly $k$ values of $h$.

We shall define the random variables $X_1, \ldots, X_N$ by $X_i = c_{\xi(i)}$ for $1 \le i \le N$, and the random variables $Y_1, \ldots, Y_{kN}$ by $Y_i = c_{\eta(i)}$ for $1 \le i \le kN$. This definition creates a coupling between the sequence $X_1, \ldots, X_N$, which is distributed as a random sample without replacement from the population $c_1, \ldots, c_N$, and the sequence $Y_1, \ldots, Y_{kN}$, which is distributed as a random sample without replacement from the population $D$. For $1 \le n \le N$, with $S_n = X_1 + \cdots + X_n$ and $T_n = Y_1 + \cdots + Y_n$, the proof that this coupling is a martingale is exactly as in the proof of Proposition 1.1. □
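The same brute-force check can be run for the cohort construction. In the sketch below (hypothetical helper names, 0-based labels), urn $\mathcal{Y}$ is represented by the cohort index of each of its $kN$ balls, and every one of the $(kN)!$ drawing orders is enumerated together with every order of draws from urn $\mathcal{X}$; this is only feasible for very small $N$ and $k$.

```python
from collections import Counter, defaultdict
from fractions import Fraction
from itertools import permutations


def cohort_eta(xi, y_order):
    """Labels eta(1), ..., eta(kN) produced by the Proposition 3.1 procedure.

    xi: order in which labels are drawn from urn X (a permutation of 0..N-1).
    y_order: cohort index of each ball, in the order balls are drawn from urn Y.
    """
    cohort_label, next_x, eta = {}, 0, []
    for m in y_order:
        if m not in cohort_label:  # unlabeled ball: label its whole cohort
            cohort_label[m] = xi[next_x]
            next_x += 1
        eta.append(cohort_label[m])
    return eta


def cohort_conditional_means(c, k, n):
    """Exact Ex[T_n | S_n = s] for the Proposition 3.1 coupling (tiny cases only)."""
    N = len(c)
    balls = [m for m in range(N) for _ in range(k)]  # k balls per cohort
    total, count = defaultdict(Fraction), defaultdict(int)
    for xi in permutations(range(N)):
        s = sum(c[xi[j]] for j in range(n))
        for y_order in permutations(balls):  # all (kN)! equally likely orders
            eta = cohort_eta(xi, y_order)
            total[s] += sum(c[label] for label in eta[:n])
            count[s] += 1
    return {s: total[s] / count[s] for s in total}


c, k = [0, 1, 4], 2  # hypothetical population and multiplicity
for n in range(1, len(c) + 1):
    assert all(mean == s for s, mean in cohort_conditional_means(c, k, n).items())
```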

An obvious question left open by Proposition 3.1 is whether there is a martingale coupling between sampling without replacement from population $kC$ and sampling without replacement from population $k'C$ (where $k' > k > 1$, with $k$ not dividing $k'$).

Our final result concerns sampling with "surreplacement", in which not only is each individual drawn from a population replaced, but some number of "copies" of that individual are added to the population.

Proposition 3.2: Let the population $C$ consist of $N$ values $c_1, c_2, \ldots, c_N$. Let $X_1, X_2, \ldots, X_n$ denote a random sample without replacement from $C$ and let $Y_1, Y_2, \ldots, Y_n$ denote a random sample with surreplacement from $C$, whereby each individual drawn is replaced by a total of $d \ge 1$ copies of that individual. Let $S_n = X_1 + X_2 + \cdots + X_n$ and $T_n = Y_1 + Y_2 + \cdots + Y_n$. Then there is a martingale coupling between $S_n$ and $T_n$. (The case $d = 1$ is simply the case of sampling with replacement, dealt with in Proposition 1.1.)

Proof: We begin with two urns. The first urn, $\mathcal{X}$, contains $N$ balls, $x_1, \ldots, x_N$. Each of these balls is labeled with its number; that is, ball $x_i$ is labeled $i$. Balls will be drawn from urn $\mathcal{X}$ without replacement. The second urn, $\mathcal{Y}$, contains $N$ balls. Each of these balls is initially unlabeled but will eventually be assigned a label. Balls will be drawn from urn $\mathcal{Y}$ with surreplacement.



We now perform an infinite sequence of steps as follows. In the course of these steps we shall define a bijective map $\xi : \{1, \ldots, N\} \to \{1, \ldots, N\}$ and a surjective map $\eta : \{1, 2, \ldots\} \to \{1, \ldots, N\}$. At each step, we draw a ball from urn $\mathcal{Y}$. If the ball drawn is still unlabeled, we draw a ball from urn $\mathcal{X}$, assign the label of the ball drawn from urn $\mathcal{X}$ to the ball drawn from urn $\mathcal{Y}$ and to $d - 1$ new balls, and then return these $d$ balls to urn $\mathcal{Y}$. If the ball drawn from urn $\mathcal{Y}$ has already been assigned a label, we assign that label to $d - 1$ new balls, and then return these $d$ balls to urn $\mathcal{Y}$. Let us consider a ball initially in urn $\mathcal{Y}$. The probability that it is not drawn in the first step is $1 - 1/N$, the probability that it is not drawn on the second step is $1 - 1/(N + (d - 1))$, and so forth, with the probability that it is not drawn on the $i$-th step being $1 - 1/(N + (i - 1)(d - 1))$. Since the sum $\sum_{i \ge 1} 1/(N + (i - 1)(d - 1))$ diverges to infinity, the product $\prod_{i \ge 1} \bigl(1 - 1/(N + (i - 1)(d - 1))\bigr)$ diverges to zero. Thus, with probability one, every ball initially in urn $\mathcal{Y}$ will eventually be drawn, so every ball initially in urn $\mathcal{Y}$ will eventually be assigned a label. Of course, the balls added to $\mathcal{Y}$ are assigned labels at the times they are added.
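The divergence argument can be made concrete with exact arithmetic. The sketch below (a hypothetical helper, using `Fraction`) computes the probability that a fixed initial ball of urn $\mathcal{Y}$ is still undrawn after $m$ steps. For $d = 2$ the product telescopes to $(N - 1)/(m + N - 1)$, which tends to zero as $m \to \infty$, although far more slowly than the geometric rate $(1 - 1/N)^m$ of ordinary sampling with replacement ($d = 1$).

```python
from fractions import Fraction


def undrawn_probability(N, d, m):
    """Pr[a fixed initial ball of urn Y is not drawn in steps 1..m].

    At step i the urn holds N + (i - 1)(d - 1) balls, and each draw is uniform
    and independent of earlier draws, so the probabilities multiply.
    """
    prob = Fraction(1)
    for i in range(1, m + 1):
        prob *= 1 - Fraction(1, N + (i - 1) * (d - 1))
    return prob


N = 3
for m in (1, 10, 100, 1000):
    # d = 1: geometric decay; d = 2: telescoping product (N - 1) / (m + N - 1).
    assert undrawn_probability(N, 1, m) == Fraction(N - 1, N) ** m
    assert undrawn_probability(N, 2, m) == Fraction(N - 1, m + N - 1)
```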

We define $\xi(i)$ to be the label of the $i$-th ball drawn from urn $\mathcal{X}$. Since every ball from $\mathcal{X}$ is eventually drawn, and balls are drawn from $\mathcal{X}$ without replacement, $\xi$ is a permutation of $\{1, \ldots, N\}$. We define $\eta(i)$ to be the label assigned to the ball drawn from urn $\mathcal{Y}$ at the $i$-th step (either during the $i$-th step or at some previous step). Since each of the labels $1, \ldots, N$ is eventually assigned to one of the balls in urn $\mathcal{Y}$, $\eta$ maps $\{1, 2, \ldots\}$ onto $\{1, \ldots, N\}$.

The process just described creates a coupling between $\xi$, which is uniformly distributed over all permutations of $\{1, \ldots, N\}$, and $\eta$, which is a sequence $\eta(1), \eta(2), \ldots$ of random variables, each distributed over $\{1, \ldots, N\}$ in the way appropriate to surreplacement. Specifically, for $i \ge 1$, the conditional probability that $\eta(i) = j$, given that $\eta(h) = j$ for exactly $k$ values of $h < i$, is $(1 + k(d - 1))/(N + (i - 1)(d - 1))$.
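A direct simulation makes the surreplacement dynamics and the stated conditional probability concrete. The sketch below (hypothetical names, 0-based labels) keeps urn $\mathcal{Y}$ as a list with one entry per ball, with `None` marking an unlabeled initial ball. After $m$ steps, the fraction of balls carrying an already assigned label $j$ should equal $(1 + k(d - 1))/(N + m(d - 1))$, where $k$ is the number of times $j$ has been drawn; this is the formula above with $i = m + 1$.

```python
import random
from fractions import Fraction


def surreplacement_coupling(N, d, steps, rng=random):
    """Simulate the Proposition 3.2 procedure for `steps` steps (0-based labels).

    Returns (xi, eta, urn_y), where urn_y lists the label of every ball now in
    urn Y (None for initial balls that are still unlabeled).
    """
    urn_x = list(range(N))
    rng.shuffle(urn_x)                   # popping from the end gives a uniform order
    xi, eta = [], []
    urn_y = [None] * N
    for _ in range(steps):
        pos = rng.randrange(len(urn_y))  # draw uniformly from urn Y
        if urn_y[pos] is None:           # unlabeled: take the next label from urn X
            urn_y[pos] = urn_x.pop()
            xi.append(urn_y[pos])
        label = urn_y[pos]
        eta.append(label)
        urn_y.extend([label] * (d - 1))  # the drawn ball stays, plus d - 1 copies
    xi.extend(reversed(urn_x))           # later draws from urn X continue this order
    return xi, eta, urn_y


rng = random.Random(3)
N, d, m = 5, 3, 25
xi, eta, urn_y = surreplacement_coupling(N, d, m, rng)
for j in set(eta):
    k = eta.count(j)
    share = Fraction(urn_y.count(j), len(urn_y))
    assert share == Fraction(1 + k * (d - 1), N + m * (d - 1))
```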

Let $c_1, \ldots, c_N$ be real numbers. We shall define the random variables $X_1, \ldots, X_N$ by $X_i = c_{\xi(i)}$ for $1 \le i \le N$, and the random variables $Y_1, Y_2, \ldots$ by $Y_i = c_{\eta(i)}$ for $i \ge 1$. This definition creates a coupling between the sequence $X_1, \ldots, X_N$, which is distributed as a random sample without replacement from the population $c_1, \ldots, c_N$, and the sequence $Y_1, Y_2, \ldots$, which is distributed as a sequence of samples with surreplacement from the same population.

Let $n$ be an integer in the range $1 \le n \le N$. We define $S_n = X_1 + \cdots + X_n$ and $T_n = Y_1 + \cdots + Y_n$. This definition creates a coupling between $S_n$, which is distributed as the sum of a random sample of size $n$ without replacement from the population $c_1, \ldots, c_N$, and $T_n$, which is distributed as the sum of a random sample of size $n$ with surreplacement from the same population. The proof that this coupling is a martingale is exactly as in the proof of Proposition 1.1. □
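As with Proposition 1.1, the martingale property can be confirmed by exhaustive enumeration for a tiny population. At step $i$ urn $\mathcal{Y}$ holds $N + (i - 1)(d - 1)$ balls, so every sequence of drawn positions is equally likely and can be listed with `itertools.product`. The sketch below (hypothetical helper, 0-based labels, integer values) checks $\mathrm{Ex}[T_n \mid S_n = s] = s$ exactly.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import permutations, product


def surreplacement_conditional_means(c, d, n):
    """Exact Ex[T_n | S_n = s] for the Proposition 3.2 coupling, by enumeration."""
    N = len(c)
    total, count = defaultdict(Fraction), defaultdict(int)
    # Step i (0-based) draws one of the N + i(d - 1) balls then in urn Y.
    sizes = [range(N + i * (d - 1)) for i in range(n)]
    for xi in permutations(range(N)):
        s = sum(c[xi[j]] for j in range(n))
        for positions in product(*sizes):
            urn_y, next_x, t = [None] * N, 0, 0
            for pos in positions:
                if urn_y[pos] is None:           # unlabeled initial ball
                    urn_y[pos] = xi[next_x]
                    next_x += 1
                label = urn_y[pos]
                t += c[label]
                urn_y.extend([label] * (d - 1))  # d - 1 new copies of the label
            total[s] += t
            count[s] += 1
    return {s: total[s] / count[s] for s in total}


c = [0, 1, 3]  # hypothetical population values
for d in (1, 2, 3):
    for n in range(1, len(c) + 1):
        assert all(m == s for s, m in surreplacement_conditional_means(c, d, n).items())
```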

An obvious question left open by Proposition 3.2 is whether there is a martingale coupling between sampling with surreplacement of $d$ copies and sampling with surreplacement of $d'$ copies from the same population, where $d' > d \ge 1$.